A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets†

نویسندگان

  • Silvio Bicciato
  • Roberta Spinelli
  • Mattia Zampieri
  • Eleonora Mangano
  • Francesco Ferrari
  • Luca Beltrame
  • Ingrid Cifola
  • Clelia Peano
  • Aldo Solari
  • Cristina Battaglia
چکیده

The integration of high-throughput genomic data represents an opportunity for deciphering the interplay between structural and functional organization of genomes and for discovering novel biomarkers. However, the development of integrative approaches to complement gene expression (GE) data with other types of gene information, such as copy number (CN) and chromosomal localization, still represents a computational challenge in the genomic arena. This work presents a computational procedure that directly integrates CN and GE profiles at genome-wide level. When applied to DNA/RNA paired data, this approach leads to the identification of Significant Overlaps of Differentially Expressed and Genomic Imbalanced Regions (SODEGIR). This goal is accomplished in three steps. The first step extends to CN a method for detecting regional imbalances in GE. The second part provides the integration of CN and GE data and identifies chromosomal regions with concordantly altered genomic and transcriptional status in a tumor sample. The last step elevates the single-sample analysis to an entire dataset of tumor specimens. When applied to study chromosomal aberrations in a collection of astrocytoma and renal carcinoma samples, the procedure proved to be effective in identifying discrete chromosomal regions of coordinated CN alterations and changes in transcriptional levels.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Imbalanced Class Learning in Epigenetics

In machine learning, one of the important criteria for higher classification accuracy is a balanced dataset. Datasets with a large ratio between minority and majority classes face hindrance in learning using any classifier. Datasets having a magnitude difference in number of instances between the target concept result in an imbalanced class distribution. Such datasets can range from biological ...

متن کامل

The Application of a Non-Radioactive DD-AFLP Method for Profiling of Aeluropus lagopoides Differentially Expressed Transcripts under Salinity or Drought Conditions

Aeluropus lagopoides is a salt and drought tolerant grass from Poaceae family, distributed widely in arid regions. There is almost no information about the genetics or genome of this close relative of wheat that stands harsh conditions of deserts. Differential Display Amplified fragment length polymorphism (DD-AFLP) led to the improvement of a non-radioactive method for which many parameters we...

متن کامل

Integration of 198 ChIP-seq Datasets Reveals Human cis-Regulatory Regions

We analyzed 198 datasets of chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) and developed a methodology for identification of high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Althoug...

متن کامل

Integrating Diverse Types of Genomic Data to Identify Genes that Underlie Adverse Pregnancy Phenotypes

Progress in understanding complex genetic diseases has been bolstered by synthetic approaches that overlay diverse data types and analyses to identify functionally important genes. Pre-term birth (PTB), a major complication of pregnancy, is a leading cause of infant mortality worldwide. A major obstacle in addressing PTB is that the mechanisms controlling parturition and birth timing remain poo...

متن کامل

تخمین مکان نواحی کدکننده پروتئین در توالی عددی DNA با استفاده پنجره با طول متغیر بر مبنای منحنی سه بعدی Z

In recent years, estimation of protein-coding regions in numerical deoxyribonucleic acid (DNA) sequences using signal processing tools has been a challenging issue in bioinformatics, owing to their 3-base periodicity. Several digital signal processing (DSP) tools have been applied in order to Identify the task and concentrated on assigning numerical values to the symbolic DNA sequence, then app...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 37  شماره 

صفحات  -

تاریخ انتشار 2009